CASENet: Deep Category-Aware Semantic Edge Detection
Boundary and edge cues are highly beneficial in improving a wide variety of
vision tasks such as semantic segmentation, object recognition, stereo, and
object proposal generation. Recently, the problem of edge detection has been
revisited and significant progress has been made with deep learning. While
classical edge detection is a challenging binary problem in itself, the
category-aware semantic edge detection by nature is an even more challenging
multi-label problem. We model the problem such that each edge pixel can be
associated with more than one class as they appear in contours or junctions
belonging to two or more semantic classes. To this end, we propose a novel
end-to-end deep semantic edge learning architecture based on ResNet and a new
skip-layer architecture where category-wise edge activations at the top
convolution layer share and are fused with the same set of bottom layer
features. We then propose a multi-label loss function to supervise the fused
activations. We show that our proposed architecture benefits this problem with
better performance, and we outperform the current state-of-the-art semantic
edge detection methods by a large margin on standard data sets such as SBD and
Cityscapes.
Comment: Accepted to CVPR 201
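To make the multi-label formulation concrete, here is a minimal sketch of a per-class sigmoid cross-entropy over edge activations, in which one pixel may be positive for several classes at once. It is an illustration under stated assumptions, not the authors' implementation: the function name `multilabel_edge_loss`, the flat list layout, and the absence of any class-balancing weight are all choices made for this example.

```python
import math

def multilabel_edge_loss(activations, labels):
    """Sum of independent binary cross-entropies, one per (class, pixel).

    activations[k][p]: raw score (logit) for class k at pixel p.
    labels[k][p]:      1 if pixel p is an edge of class k, else 0.
    Hypothetical helper; any weighting between edge and non-edge
    pixels is deliberately left out of this sketch.
    """
    total = 0.0
    for class_scores, class_labels in zip(activations, labels):
        for s, y in zip(class_scores, class_labels):
            # Numerically stable form of
            # -y*log(sigmoid(s)) - (1-y)*log(1 - sigmoid(s)).
            total += max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
    return total

# Two classes, three pixels. Pixel 1 lies on a boundary shared by both
# classes, so it is labeled positive for class 0 and for class 1.
labels = [[0, 1, 0],
          [0, 1, 1]]
good = [[-5.0, 5.0, -5.0],
        [-5.0, 5.0, 5.0]]   # fires for both classes at pixel 1
bad = [[-5.0, 5.0, -5.0],
       [-5.0, -5.0, 5.0]]   # misses class 1 at pixel 1

loss_good = multilabel_edge_loss(good, labels)
loss_bad = multilabel_edge_loss(bad, labels)
```

Because each class is scored with its own sigmoid rather than a shared softmax, predicting two classes at the same pixel is not penalized, which is what distinguishes this setup from single-label classification.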
KCRC-LCD: Discriminative Kernel Collaborative Representation with Locality Constrained Dictionary for Visual Categorization
We consider the image classification problem via kernel collaborative
representation classification with locality constrained dictionary (KCRC-LCD).
Specifically, we propose a kernel collaborative representation classification
(KCRC) approach in which the kernel method is used to improve the discrimination
ability of collaborative representation classification (CRC). We then measure
the similarities between the query and atoms in the global dictionary in order
to construct a locality constrained dictionary (LCD) for KCRC. In addition, we
discuss several similarity measure approaches in LCD and further present a
simple yet effective unified similarity measure whose superiority is validated
in experiments. There are several appealing aspects associated with LCD. First,
LCD can be nicely incorporated under the framework of KCRC. The LCD similarity
measure can be kernelized under KCRC, which theoretically links CRC and LCD
under the kernel method. Second, KCRC-LCD scales better with both the
training set size and the feature dimension. An example shows that KCRC is able to
perfectly classify data with certain distribution, while conventional CRC fails
completely. Comprehensive experiments on many public datasets also show that
KCRC-LCD is a robust discriminative classifier with both excellent performance
and good scalability, performing comparably to or better than many other
state-of-the-art approaches.
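The two steps described above (building a local dictionary around the query, then coding the query over it in kernel space) can be sketched as follows. This is a hedged illustration, not the paper's method in detail: the RBF kernel, the use of plain kernel similarity to select atoms, the ridge parameter `lam`, and all function names are assumptions for this example, and the paper's unified similarity measure is not reproduced.

```python
import math

def rbf(u, v, gamma=1.0):
    """RBF kernel; the kernel choice is an assumption of this sketch."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def kcrc_lcd_predict(atoms, atom_labels, query, num_local=6, lam=0.01):
    # LCD step (assumed form): keep the atoms most similar to the query.
    sims = [rbf(a, query) for a in atoms]
    keep = sorted(range(len(atoms)), key=lambda i: -sims[i])[:num_local]
    local = [atoms[i] for i in keep]
    local_labels = [atom_labels[i] for i in keep]
    k_q = [sims[i] for i in keep]
    n = len(local)
    K = [[rbf(local[i], local[j]) for j in range(n)] for i in range(n)]
    # KCRC step: ridge-regularized coding, alpha = (K + lam*I)^-1 k_q.
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(A, k_q)
    # Class-wise squared reconstruction residual in feature space:
    # k(q,q) - 2 * alpha_c^T k_c + alpha_c^T K_cc alpha_c.
    residuals = {}
    for c in sorted(set(atom_labels)):
        idx = [i for i in range(n) if local_labels[i] == c]
        cross = sum(alpha[i] * k_q[i] for i in idx)
        quad = sum(alpha[i] * alpha[j] * K[i][j] for i in idx for j in idx)
        residuals[c] = rbf(query, query) - 2.0 * cross + quad
    return min(residuals, key=residuals.get), residuals

# Toy data: class 0 clusters near the origin, class 1 lies on a ring.
atoms = [(0.2, 0.0), (-0.2, 0.0), (0.0, 0.2), (0.0, -0.2),
         (2.0, 0.0), (-2.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
atom_labels = [0, 0, 0, 0, 1, 1, 1, 1]
pred_inner, _ = kcrc_lcd_predict(atoms, atom_labels, (0.1, 0.1))
pred_outer, _ = kcrc_lcd_predict(atoms, atom_labels, (0.1, 1.9))
```

Restricting the solve to `num_local` atoms keeps the linear system small regardless of the size of the global dictionary, which is one plausible reading of the scalability claim. Some CRC variants also divide each residual by the norm of the class coefficients; that normalization is omitted here.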
MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training
We propose MinVIS, a minimal video instance segmentation (VIS) framework that
achieves state-of-the-art VIS performance with neither video-based
architectures nor training procedures. By only training a query-based image
instance segmentation model, MinVIS outperforms the previous best result on the
challenging Occluded VIS dataset by over 10% AP. Since MinVIS treats frames in
training videos as independent images, we can drastically sub-sample the
annotated frames in training videos without any modifications. With only 1% of
labeled frames, MinVIS outperforms or is comparable to fully-supervised
state-of-the-art approaches on YouTube-VIS 2019/2021. Our key observation is
that queries trained to be discriminative between intra-frame object instances
are temporally consistent and can be used to track instances without any
manually designed heuristics. MinVIS thus has the following inference pipeline:
we first apply the trained query-based image instance segmentation to video
frames independently. The segmented instances are then tracked by bipartite
matching of the corresponding queries. This inference is done in an online
fashion and does not need to process the whole video at once. MinVIS thus has
the practical advantages of reducing both the labeling costs and the memory
requirements, while not sacrificing the VIS performance. Code is available at:
https://github.com/NVlabs/MinVI
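The tracking step described above, bipartite matching of query embeddings between consecutive frames, can be sketched as follows. This is a minimal illustration and not the released code: it assumes every frame has the same number of non-zero query embeddings, uses cosine similarity as the matching score, and finds the optimal assignment by brute force over permutations, which is only practical for a handful of queries. A real system would more likely use the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`. The names `match_queries` and `track` are hypothetical.

```python
import itertools
import math

def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_queries(prev_queries, curr_queries):
    """Return perm such that curr_queries[perm[i]] matches prev_queries[i].

    Brute-force optimal assignment maximizing total cosine similarity
    (a stand-in for a Hungarian solver, chosen to avoid dependencies).
    """
    n = len(prev_queries)
    sim = [[cosine(p, c) for c in curr_queries] for p in prev_queries]
    best = max(itertools.permutations(range(n)),
               key=lambda perm: sum(sim[i][perm[i]] for i in range(n)))
    return list(best)

def track(frames):
    """Reorder each frame's queries to follow the first frame's order.

    Frames are processed one at a time, so only the previous frame's
    queries need to be kept in memory (online inference).
    """
    tracked = [frames[0]]
    for curr in frames[1:]:
        perm = match_queries(tracked[-1], curr)
        tracked.append([curr[j] for j in perm])
    return tracked

# Two frames with three instance queries each; frame 1 lists the same
# instances in a different order and with slightly drifted embeddings.
frame0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame1 = [[0.1, 0.9], [0.9, 1.1], [1.0, 0.1]]
perm = match_queries(frame0, frame1)
tracked = track([frame0, frame1])
```

After `track`, index `i` in every frame refers to the same instance, so per-frame masks can be stitched into video-level instance tracks without any video-specific training.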